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ABSTRACT 

Eigenvalues and eigenfunctions of linear operators are important to many areas of ap- 
plied mathematics. The ability to approximate these quantities numerically is becoming 
increasingly important in a wide variety of applications. This increasing demand has fu- 
eled interest in the development of new methods and software for the numerical solution 
of large-scale algebraic eigenvalue problems. In turn, the existence of these new methods 
and software, along with the dramatically increased computational capabilities now avail- 
able, has enabled the solution of problems that would not even have been posed five or ten 
years ago. Until very recently, software for large-scale nonsymmetric problems was virtually 
non-existent. Fortunately, the situation is improving rapidly. 

The purpose of this article is to provide an overview of the numerical solution of large- 
scale algebraic eigenvalue problems. The focus will be on a class of methods called Krylov 
subspace projection methods. The well-known Lanczos method is the premier member of 
this class. The Arnoldi method generalizes the Lanczos method to the nonsymmetric case. 
A recently developed variant of the Arnoldi/Lanczos scheme called the Implicitly Restarted 
Arnoldi Method is presented here in some depth. This method is highlighted because of its 
suitability as a basis for software development. 


J This work was supported in part by NAS1-19480 while the author was in residence at the Institute for 
Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, 
VA 23681-0001. 
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1. Introduction 

Discussion begins with a brief synopsis of the theory and the basic iterations suitable 
for large-scale problems to motivate the introduction of Krylov subspaces. Then the Lanc- 
zos/ Arnold i factorization is introduced, along with a discussion of its important approx- 
imation properties. Spectral transformations are presented as a means to improve these 
approximation properties and to enhance convergence of the basic methods. Restarting is 
introduced as a way to overcome intractable storage and computational requirements in the 
original Arnoldi method. Implicit restarting is a new sophisticated variant of restarting. 
This new technique may be viewed as a truncated form of the powerful implicitly shifted 
QR technique that is suitable for large-scale problems. Implicit restarting provides a means 
to approximate a few eigenvalues with user specified properties in space proportional to nk, 
where k is the number of eigenvalues sought, and n is the problem size. 

Generalized eigenvalue problems are discussed in some detail. They arise naturally in 
PDE applications and they have a number of subtleties with respect to numerically stable 
implementation of spectral transformations. 

Software issues and considerations for implementation on vector and parallel computers 
are introduced in the later sections. Implicit restarting has provided a means to develop 
very robust and efficient software for a wide variety of large-scale eigenproblems. A public 
domain software package called ARPACK has been developed in Fortran 77. This package 
has performed well on workstations, parallel-vector supercomputers, distributed-memory 
parallel systems and clusters of workstations. The features of this package along with some 
applications and performance indicators occupy the final section of this paper. 


2. Eigenvalues, Power Iterations, and Spectral Transformations 

A brief discussion of the mathematical structure of the eigenvalue problem is necessary 
to fix notation and introduce ideas that lead to an understanding of the behavior, strengths 
and limitations of the algorithms. In this discussion, the real and complex number fields are 
denoted by R and C, respectively. The standard n- dimensional real and complex vectors 
are denoted by R” and C n and the symbols R mxn and C mxn will denote the real and 
complex matrices m rows and n columns. Scalars are denoted by lower case Greek letters, 
vectors are denoted by lower case Latin letters and matrices by capital Latin letters. The 
transpose of a matrix A is denoted by A T and the conjugate- transpose by A H . The symbol, 
|| - || will denote the Euclidean or 2-norm of a vector. The standard basis of C” is denoted 
by the set { ej}j =1 . 

The set of numbers a(A) = {A 6 C : rank(A - XI) < n )} is called the spectrum of A. 
The elements of this discrete set are the eigenvalues of A and they may be characterized as 
the n roots of the characteristic polynomial Pa(X) = det(XI — A). Corresponding to each 
distinct eigenvalue A £ o {A) is at least one nonzero vector x such that Ax = xA. This vector 
is called a right eigenvector of A corresponding to the eigenvalue A. The pair (x, A) is called 
an eigenpair. A nonzero vector y such that y H A = Xy H is called a left eigenvector. The 
multiplicity n a (A) of A as a root of the characteristic polynomial is the algebraic multiplicity 
and the dimension n g (X) of Null) XI - A) is the geometric multiplicity of A. A matrix is 
defective if n 3 (A) < n 0 (A) and otherwise A is nondefective. The eigenvalue A is simple if 
n a (A) - 1. 
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A subspace S of C nxn is called an invariant subspace of A if AS C S. It is straightfor- 
ward to show if A 6 C nXn , X € C nxk and B e C kxk satisfy 


AX = XB, (1) 

then S = Range(X) is an invariant subspace of A. Moreover, if X has full column rank 
k then the columns of X form a basis for this subspace and a (B) C &(A). If k = n then 
<j(B) = a(A ) and A is said to be similar to B under the similarity transformation X. 
A is diagonalizable if it is similar to a diagonal matrix and this property is equivalent to 
A being nondefective. 

An extremely important theorem to the study of numerical algorithms for eigenproblems 
is the Schur decomposition. It states that every square matrix is unitarily similar to an 
upper triangular matrix. In other words, for any linear operator on C n , there is a unitary 
basis in which the operator has an upper triangular matrix representation. 

Theorem 1 (Schur Decomposition). Let A E C nXn . Then there is a unitary matrix Q and 
an upper triangular matrix R such that 

AQ = QR. (2) 

The diagonal elements of R are the eigenvalues of A. 


From the Schur decomposition, the fundamental structure of Hermitian and normal matrices 
is easily exposed: 

Lemma 2 A matrix A E C nxn is normal ( AA H = A H A ) if and only if A — QAQ H with 
q ^ Qnxn un n ar y and A E C nxn diagonal A matrix A E C nxn is Hermitian (A — A H ) if 
and only if A = QAQ H with Q E C nxn unitary and A E R nxn diagonal In either case , the 
diagonal entries of A are the eigenvalues of A and the columns of Q are the corresponding 
eigenvectors. 

The proof follows easily through substitution of the Schur decomposition in place of A in 
each of the defining relationships. The columns of Q are called Schur vectors in general and 
these are eigenvectors of A if and only if A is normal. 

For purposes of algorithmic development this structure is fundamental. In fact, the well 
known Implicitly Shifted QR- Algorithm (Francis, 1961) is designed to produce a sequence 
of unitary similarity transformations Qj that iteratively reduce A to upper triangular form. 
This algorithm begins with an initial unitary similarity transformation V of A to the con- 
densed form AV = VH, where H is upper Hessenberg (tridiagonal in case A = A H ). Then 
the following iteration is performed: 

where Q is unitary and R is upper triangular (i.e., the QR factorization of H — pi ). It 
is easy to see that H is unitarily similar to A throughout the course of this iteration. The 
iteration is continued until the subdiagonal elements of H converge to zero, i.e. until a 
Schur decomposition has been (approximately) obtained. In the standard implicitly shifted 
Qi?-iteration, the unitary matrix Q is never actually formed, it is computed indirectly as 
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Algorithm 1: Implicitly Shifted Q/J-iteration 


Input: (A, V , H ) with AV = VH, V H V = I , H upper Hessenberg; 
For j = 1,2,3,... until convergence, 

(al.l) Select a shift p. ~ Pj 
(al.2) Factor [Q. /?] = qr (H - fil) ; 

(al.3) H - Q h HQ ; V - VQ; 

End-For 


a product of 2 x 2 Givens or 3 x 3 Householder transformations through a “bulge chase" 
process. The elegant details of an efficient and stable implementation would be too much 
of a digression here. They may be found in (Golub and Van Loan, 1983). The convergence 
behavior of this iteration is fascinating. The columns of V converge to Schur vectors at 
various rates. These rates are fundamentally linked to the simple power method and its 
rapidly convergent variant, inverse iteration (see Watkins and Eisner, 1991). 

Despite the extremely fast rate of convergence and the efficient use of storage, the 
implicitly shifted QR method is not suitable for large-scale problems and it has proved to be 
extremely difficult to parallelize. Large-scale problems are typically sparse or structured so 
that a matrix- vector product w <- Av may be computed with time and storage proportional 
to n rather than n 2 . A method based upon full similarity transformations quickly destroys 
this structure. Storage and operation counts become order n 2 . Hence, there is considerable 
motivation for methods that only require matrix-vector products with the original A. 


2.1. Single vector power iterations 

Probably the oldest algorithm for approximating eigenvalues and corresponding eigen- 
vectors of a matrix is the power method. This method is an important tool in its own right 
when conditions are appropriate. It is very simple and only requires matrix- vector products 
along with two vectors of storage. In addition to its role as an algorithm, the method is 
central to the development, understanding, and convergence analysis of all of the iterative 
methods discussed here. 

Algorithm 2: The Power Method 


Input: ( A , Vo ) 

Put V = V 0 /\\v 0 \\vo\ 

For j — 1, 2, 3, ... until convergence , 
(a2.1) w — Av: 

(a2.2) A=^; 

(a2.3) i = i_max (iu); 

(a2.4) v v/(ef w) ; 

End_For 
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At Step (a2.3), i is the index of the element of w with largest absolute value. It is easily 
seen that the contents of v after fc-steps of this iteration will be the vector 


Vk 


^ ej A k v 0 


)A k V 0 = ( 


Pk 

ej A k v 0 


)( — 
Pk 


for any nonzero scalar pk . In particular, this iteration may be analyzed as if the vectors had 
been scaled by pk = A* at each step, with Ai an eigenvalue of A with largest magnitude. 
If A is diagonalizable with eigenpairs {(xj,Aj),l < j < n} and v Q has the expansion 
v 0 — x jlj i n this basis then 

= = X>(V*i)S- (3) 

•*1 A 1 j = 1 j=\ 

If Ai is a simple eigenvalue then 

( 7 — 


It follows that Vk — * x i/(eJ x \), where i = i_max (xi), at a linear rate with a convergence 
factor of | ^ | . 

While the power method is useful, it has two obvious drawbacks. Convergence may be 
arbitrarily slow or may not happen at all. Only one eigenvalue and corresponding vector 
can be found. 


2.2, Spectral transformations 

The basic power iteration may be modified to overcome these difficulties. The most fun- 
damental modification is to employ a spectral transformation. Spectral transformations 
are generally based upon the following: 

Let A € C nxn have an eigenvalue A with corresponding eigenvector x. 

L Let p(r) = 70 + 71 r + 72X 2 + . . . 4- 7^7-^. Then p( A) is an eigenvalue of the matrix 
p(A) = 70/ + 71 A + 72 A 2 + . . . + jkA^ with corresponding eigenvector x (i.e. p(A)x = 
xp( A) ). 

2 . If r(r) = where p and q are polynomials with q{A) nonsingular, define r(A ) = 
[q(A)}~ 1 p(A). Then r(A) is an eigenvalue of r(A) with corresponding eigenvector x. 

It is often possible to construct a polynomial or rational function <f>(r) such that 
|d>( A t )| < |<A(Aj)| for 1 < j < n, j ^ i, 

where A t is an eigenvalue of particular interest. This is called a spectral transformation since 
the eigenvectors of the transformed matrix 4 >(A) remain the same, but the corresponding 
eigenvalues X 3 are transformed to <f>( Aj). Applying the power method with <fi(A) in place 
of A will then produce the eigenvector q = x t corresponding to A* at a linear convergence 
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rate with a convergence factor of |^j| < 1. Once the eigenvector has been found, the 
eigenvalue A = A, may be calculated directly from a Rayleigh Quotient A = q H Aq/q H q. 


2.3. Inverse iteration 

Spectral transformation can lead to dramatic enhancement of the convergence of the 
power method. Polynomial transformations may be applied using only matrix- vector prod- 
ucts. Rational transformations require the solution of linear systems with the transformed 
matrix as the coefficient matrix. The simplest rational transformation turns out to be 
very powerful and is almost exclusively used for this purpose. If p ^ a(A) then A — nl is 
invertible and a{[A - p/] _1 ) = { 1 /(A - p) : A € v(A)} . This transformation is very suc- 
cessful since eigenvalues near the shift p are transformed to extremal eigenvalues which are 
well separated from the other ones while the original extremal eigenvalues are transformed 
near the origin. Hence under this transformation the eigenvector q corresponding to the 
eigenvalue of A that is closest to p may be readily found and the corresponding eigenvalue 
may obtained either through the formula A = 0 + 1/p, where 9 is the eigenvalue of the 
transformed matrix, or it may be calculated directly from a Rayleigh quotient. 


Algorithm 3: The Inverse Power Method 


Input: (A,v 0 ,fi ) 

Put v — i?o/||Vo||co; 

For j = 1,2,3,... until convergence , 

(a3.1) Solve (A — fil)w = r; 

(a3.2) A = #i + ^; 

(a3.3) i = i_max («?); 

(a3.4) v — v/(ef w) ; 

End-For 

Observe that the formula for A at Step (a3.2) is equivalent to forming A = (w H Aw)/(w H w) 
so an additional matrix vector product is not necessary to obtain the Rayleigh quotient esti- 
mate. The analysis of convergence remains entirely in tact. This iteration converges linearly 
with the convergence factor 

Pi - pI 
|a 2 -pI’ 

where the eigenvalues of A have been re-indexed so that |Ai — p| < P 2 — p| < P 3 — P | < 
... < |A n — p|. Hence, the convergence becomes faster as p gets closer to Aj. 

This result is encouraging but still leaves us wondering how to select the shift p to be 
close to the unknown eigenvalue we are trying to compute. In many applications the choice 
is apparent from the requirements of the problem. It is also possible to change the shift 
at each iteration at the expense of a new matrix factorization at each step. An obvious 
choice would be to replace the shift with the current Rayleigh quotient estimate. This 
method, called Rayleigh Quotient (RQ) iteration, has very impressive convergence rates 
indeed. Rayleigh Quotient Iteration converges at a quadratic rate in general and at a cubic 
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rate on Hermitian problems. For a more detailed discussion of the eigenvalue problem and 
basic algorithms see (Golub and Van Loan, 1983, Stewart, 1973, and Wilkinson, 1965). 

3. Krylov Subspaces and Projection Methods 

Although the rate of convergence can be improved to an acceptable level through spec- 
tral transformations, power iterations are only able to find one eigenvector at a time. If 
more vectors are sought, then various deflation techniques (such as orthogonalizing against 
previously converged eigenvectors) and shift strategies must be introduced. One alternative 
is to introduce a block form of the simple power method which is often called subspace iter- 
ation. This important class of algorithms has been developed and investigated in (Stewart, 
1973). Several software efforts have been based upon this approach (Bai and Stewart, 1992, 
Duff and Scott, 1993, and Stewart and Jennings, 1992). However, there is another class 
of algorithms called Krylov subspace projection methods that are based upon the intricate 
structure of the sequence of vectors naturally produced by the power method. 

An examination of the behavior of the power sequence as exposed in equation (3) hints 
that the successive vectors produced by a power iteration may contain considerable infor- 
mation along eigenvector directions corresponding to eigenvalues other than the one with 
largest magnitude. The expansion coefficients of the vectors in the power sequence evolve 
in a very structured way. Therefore, linear combinations of the these vectors might well be 
devised to expose additional eigenvectors. A single vector power iteration simply ignores 
this additional information, but more sophisticated techniques may be employed to extract 
it. 

If one hopes to obtain additional information through various linear combinations of the 
power sequence, it is natural to formally consider the Krylov subspace 

K-k{A,v i) = Span {vi, Av\, A 2 v \, . . .,^'^ 1 } 

and to attempt to formulate the best possible approximations to eigenvectors from this 
subspace. 

It is reasonable to construct approximate eigenpairs from this subspace by imposing a 
Galerkin condition: A vector x £ /C^(A, iq) is called a Ritz vector with corresponding Ritz 
value 6 if the Galerkin condition 

< w, Ax - xO >= 0 , for all w € ICk(A,Vi) 

is satisfied. There are some immediate consequences of this definition: Let W be a matrix 
whose columns form an orthonormal basis for /C* = Kk{A,vi). Let V = WW H denote the 
related orthogonal projector onto /Cjt and define A = V AV — WBW H , where B = W H AW . 
It can be shown that 

Lemma 3 For the quantities defined above: 

1. (x,6) is a Ritz-pair if and only if x = Wy with By = yO . 

||(7 - V)AW || = || (A - A)W\\ < ||( A - M)W\\ 

for all M £ C nXn such that MfCk C K,k- 
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3. The Ritz-pairs (x,0) and the minimum value of\\(I - V)AW\\ are independent of the 
choice of orthonormal basis W . 

Item (1 ) follows immediately from the Galerkin condition since it implies that 0 = W H (AW y— 
WyO ) = By — yd. Item (2) is easily shown using invariance of || • || under unitary transfor- 
mations. Item (3) follows from the fact that V is an orthonormal basis for Kk if and only if 
V — WQ for some k x k unitary matrix Q. With this change of basis A ~ V H V H , where 
H = V H AY — Q H BQ. Since H is unitarilv similar to B , the Ritz- values remain the same 
and the Ritz-vectors are of the form x = Wy — Vy , where y = Q H y- 

These facts are actually valid for any k dimensional subspace S in place of /C*. The 
following properties are consequences of the fact that every w € Kk is of the form w — 
<t>(A)vi for some polynomial <j> of degree less than k. 

Lemma 4 For the quantities defined above : 

1. If q is a polynomial of degree less than k then 

q(A) vi = q(A)vi = Wq(B)z u 

where v\ = Wz\, and if the degree of q is k then 


Vq{A)v i = q(A)vi. 


2. If p{ A) = det(XI - B ) is the characteristic polynomial of B then p(A) = 0 and 
||p(A)ui|| < ||g(A)ni|| for all monic polynomials of degree k. 

3 . If y is any vector in C k then AW y - WBy = 7 p{A)v\ for some scalar 7. 

4 . If (x,0) is any Ritz-pair for A with respect to Kk then 

Ax - x9 = 7p(A)ni 


for some scalar 7. 

This discussion follows the treatment given by Saad in (Saad, 1992) and in his earlier 
papers. While these facts may seem esoteric, they have important algorithmic consequences. 
First, it should be noted that /Ck is an invariant subspace for A if and only if 17 = Vy , 
where AV = V R with V H V = Ik and R is k x k upper triangular. Also, Kk is an invariant 
subspace for A if Vi = Xy , where X € c nxk and AX = XA with A diagonal. This follows 
from items (2) and (3) since there is a A:-degree monic polynomial q such that q(R) = 0 and 
hence ||j5(A)vi|| < ||g(A)t?i|| = \\Vq(R)y\\ = 0. (A similar argument holds when v\ = Xy). 

Secondly, there is some algorithmic motivation to seek a convenient orthonormal basis 
V — WQ that will provide a means to successively construct these basis vectors. It is 
possible to construct a kxk unitary Q using standard Householder transformations such that 
tq = V e\ and H — Q H BQ is upper Hessenberg with non-negative subdiagonal elements. 
It is also possible to show using item (3) that in this basis, 

AV = VH 4- /e£, where / = 7p( A)vi 



and V H f = 0 follows from the Galerkin condition. 

The first observation shows that if it is possible to obtain a v\ as a linear combination 
of k eigenvectors of A then / = 0 and V is an orthonormal basis for an invariant subspace 
of A , and that the Ritz values cr(H) C &{A) and corresponding Ritz vectors are eigenpairs 
for A. The second observation leads to the Lanczos/ Arnoldi process (Arnoldi, 1951 and 
Lanczos, 1950). 

4. The Arnoldi Factorization 

Definition : If A G C nxn then a relation of the form 

AV k = V k H k + f k ej , 

where V k € C nx * has orthonormal columns, V k f k = 0 and H k € C kxk is upper Hessenberg 
with non-negative subdiagonal elements is called a k-step Arnoldi Factorization of A. If A 
is Hermitian then Hk is real, symmetric and tridiagonal and the relation is called a k-step 
Lanczos Factorization of A. The columns of V* are referred to as the Arnoldi vectors or 
Lanczos vectors respectively. 

The development of this factorization has been purely through the consequences of the 
orthogonal projection imposed by the Galerkin conditions. A more straightforward but less 
illuminating derivation is to simply truncate the reduction of A to Hessenberg form that 
precedes the implicitly shifted QR-iteration by equating the first k columns on both sides 
of the complete reduction AV = VH. An alternative way to write this factorization is 

AV k = (V k ,v k+ i) ^ j where (3 k = ||/*|| and v k+1 - ^-f k . 

This factorization may be used to obtain approximate solutions to a linear system Ax = b 
if b = v\(5 0 . The purpose here is to investigate the use of this factorization to obtain ap- 
proximate eigenvalues and eigenvectors. The discussion of the previous section implies that 
Ritz pairs satisfying the Galerkin condition are immediately available from the eigenpairs 
of the small projected matrix H . 

If HkV = y0 then the vector x = \\y satisfies 

II Ax - x6\\ = | \(AV k - V k H k )y\\ = \3 k e T k y\. 

The number \(3 k e-[y\ is called the Ritz estimate for this the Ritz pair (x,9) as an approxi- 
mate eigenpair for A . Observe that if (x, 9) is a Ritz pair then 

e = y H H k y = ( V k yfA(V k y ) = x H Ax 

is a Rayleigh Quotient (assuming ||y|| = 1) and the associated Rayleigh Quotient residual 
r(x) = Ax — x9 satisfies 

IK*)II = \0k€ k y\. 

When A is Hermitian, this relation may be used to provide computable rigorous bounds on 
the accuracy of the eigenvalues of H as approximations to eigenvalues of A; see (Parlett, 
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1980). When A is non-Hermitian the possibility of non-normality precludes such bounds 
and one can only say that the RQ-residual is small if l/3kefyj is small. However, in either 
case, if f k = 0 these the Ritz pairs become exact eigenpairs of A. 

This factorization may be advanced one step at the cost of a (sparse) matrix-vector 
product involving .4 and two dense matrix vector products involving Vj[ and The 
explicit steps needed to form a fc-Step Arnoldi factorization are shown in Algorithm 4. 

Algorithm 4: The k- Step Arnoldi Factorization 


Input: (A,v) 

Put t»i = t>/||v||; w = Av i;<*i = rf* w ; 

Put f\ w — vjai ; V (rj); H — (oi); 

For j = 1 , 2, 3 

(a4.1) = ||/, ||; Vj + 1 *— 

( Hj 

(a4.2) Vj + i d ( 

\ h e J 

(a4.3) z <— Avj+ 1 ; 

(a4.4) h — V/,; / ; +i — z — 

(a4.5) i /,41 

End.For 

In exact arithmetic, the columns of V form an orthonormal basis for the Krylov subspace 
and H is the orthogonal projection of A onto this space. In finite precision arithmetic, care 
must be taken to assure that the computed vectors are orthogonal to working precision. 
The method proposed by Daniel, Gragg, Kaufman and Stewart (DGKS) in (Daniel et al., 
1976) provides an excellent way to construct a vector /j+i that is numerically orthogonal 
to Vj+\- It amounts to computing a correction 

s = Vj+j/j+i; /j+i fj+i “ h h + 6; 

just after Step (a4.4) if necessary. A simple test can be devised to avoid this DGKS correc- 
tion if it is not needed. 

The dense matrix- vector products at Step (a4.4) and also the correction may be ac- 
complished using Level 2 BLAS. This is quite important for performance on vector, and 
parallel-vector supercomputers. The BLAS operation -GEMV is easily parallelized and 
vectorized and has a much better ratio of floating point computation to data movement 
(Dongarra et al., 1988 and Dongarra et al., 1991). The Modified Gram-Schmidt Process 
(MGS) is often used in the construction of Arnoldi factorizations. However, MGS will defi- 
nitely not produce numerically orthogonal basis vectors in practice. Moreover, MGS cannot 
be formulated in terms of Level 2 BLAS unless all of the vectors to be orthogonalized are 
known in advance and this is not the case in the Arnoldi process. For these reasons, classical 
Gram-Schmidt orthogonalization with the DGKS correction step is highly recommended. 

The information obtained through this process is completely determined by the choice 
of the starting vector. Eigen-information of interest may not appear until k gets very large. 
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In this case it becomes intractable to maintain numerical orthogonality of the basis vectors 
Vjt- Moreover, extensive storage will be required and repeatedly finding the eigensystem of 
H will become prohibitive at a cost of 0{k 3 ) flops. 

Failure to maintain orthogonality leads to several numerical difficulties. In a certain 
sense, the computation (or approximation) of the projection indicated at Step (a4.4) in a 
way that overcomes these difficulties has been the main source of research activity in these 
Krylov subspace projection methods. The computational difficulty stems from the fact that 
1 1 fk 1 1 = 0 if and only if the columns of F* span an invariant subspace of A. When Vjt “nearly” 
spans such a subspace ||/*|| will be small. Typically, in this situation, a loss of significant 
digits will take place at Step (a4.4) through numerical cancellation unless special care is 
taken (i.e., use of the DGKS correction). 

It is desirable for ||/*|| to become small because this indicates that the eigenvalues of 
H are accurate approximations to the eigenvalues of A. However, this “convergence” will 
indicate a probable loss of numerical orthogonality in V. Moreover, if subsequent Arnoldi 
vectors are not forced to be orthogonal to the converged ones then components along these 
directions re-enter the basis via round-off effects and quickly cause a spurious copy of the 
previously computed eigenvalue to appear repeatedly in the spectrum of the projected 
matrix H . The identification of this phenomenon in the symmetric case and the first rigorous 
numerical treatment is due to Paige (1971). There have been several approaches to overcome 
this problem in the symmetric case. They include: (1) complete re-orthogonalization, which 
may be accomplished through maintaining V in product Householder form (W r alker, 1988) 
or through the Modified Gram-Schmidt processes with re-orthogonalization ( Daniel et al., 
1976). (2) Selective re-orthogonalization, which has been proposed by Parlett and has been 
heavily researched by him and his students. Most notably, the theses and subsequent papers 
and computer codes of Scott and of Simon have developed this idea (Parlett and Scott, 1979, 
Parlett, 1980, and Simon, 1984). (3) No re-orthogonalization, which has been developed 
by Cullum and her colleagues. This last option introduces the almost certain possibility 
of introducing spurious eigenvalues. Various techniques have been developed to detect and 
deal with the presence of spurious eigenvalues (Cullum, 1978 and Cullum and Willoughby, 
1981). 

The appearance of spurious eigenvalues may be avoided through complete orthogonal- 
ization of the Arnoldi (or Lanczos) vectors using the DGKS correction. Computational cost 
has been cited as the reason for not employing this option. However, the cost will be rea- 
sonable if one is able to fix k at a modest size and then update the starting vector v\ = Vk€\ 
while repeatedly doing fc-Arnoldi steps. This approach was introduced in (Karush, 1951) 
and developed further by (Cullum and Donath, 1974) for the symmetric case. Saad (1980, 
1984, and 1992) has developed explicit restarting for the nonsymmetric case. Restarting has 
proven to have important consequences for the development of numerical software based 
upon ArnoldFs method and this will be explored in the following section. 

5. Restarting the Arnoldi Method 

An unfortunate aspect of the Lanczos/ Arnoldi process is that one cannot know in ad- 
vance how many steps will be required before eigenvalues of interest are well approximated 
by Ritz values. This is particularly true when the problem has a wide range of eigenvalues 
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but the eigenvalues of interest are clustered. For example, in computational chemistry, 
problems are usually symmetric and positive definite and there is a wide range of eigenval- 
ues varying over many orders of magnitude. Only the smallest eigenvalues are physically 
interesting and they are typically clustered at the low end of the spectrum. Shift and invert 
is usually not an option because of fill in from the factorizations. Without a spectral trans- 
formation. many Lanczos steps are required to obtain the smallest eigenvalues. In order to 
recover eigenvectors, one is obliged to store all of the Lanczos basis vectors (usually on a 
peripheral device) and to solve very large tridiagonal eigenvalue subproblems at each step. 
In the Arnoldi process that is used in the non-Hermitian case, not only do the basis vectors 
have to be stored, but the cost of the Hessenberg eigenvalue subproblem is 0(k 3 ) at the 
k - th step. 

5.1. Explicit restarting 

An alternative has been proposed by Saad based upon the polynomial acceleration 
scheme developed in (Manteuffel, 1978) for the iterative solution of linear systems. Saad 
(1984) proposed to restart the iteration with a vector that has been preconditioned so that 
it is more nearly in a /c-dimensional invariant subspace of interest. This preconditioning 
takes the form of a polynomial applied to the starting vector that is constructed to damp 
unwanted components from the eigenvector expansion. The resulting algorithm takes the 
form: 

Algorithm 5: An Explicitly Restarted Arnoldi Method 


Input: (A,v) 

Put in = v/IMI; 

For j = l r 2, 3, ... until convergence 

(a.5.1) Compute an m-step Arnoldi factorization 
AVm = V m H m + f me m with V m ei = Vi ; 

(a5.2) Compute a(Hm) and corresponding Ritz estimates 

and halt if desired eigenvalues are well approximated. 

(a5.3) Construct a polynomial ip based upon 
to damp unwanted components. 

(a5.4) vi — ip(A)v i; v\ — t>i/||iu|| ; 

End_For 

The construction of the polynomial at Step (a5.3) may be guided by a priori information 
about the spectrum of A or solely by information gleaned from a(H m ). A typical scheme is 
to sort the spectrum of H m into two disjoint sets Q w and fl u , with o{H m ) — Q w U The 
Ritz values in the set Sl w are to be regarded as approximations to the “wanted” eigenvalues 
of A and an open convex set C u containing f l u with Sl w f\C u = 0 is to be regarded as a region 
that approximately encloses the “unwanted” portion of the spectrum of A. The polynomial 
p is then constructed to be as small in magnitude as possible on C u when normalized, for 
example, to take the value 1 at an element of closest to dC u . Chebyshev polynomials are 


11 




appropriate when C u is taken to be an ellipse and this was the original proposal of Saad when 
he adapted the ManteufFel idea to eigenvalue calculations. Another possibility explored by 
Saad has been to take C u to be the convex hull of f l u and to construct the polynomial xp that 
best approximates 0 on this set in the least squares sense. Both of these are based upon 
well-known theory of polynomial approximation. The problem of constructing an optimal 
ellipse for this problem has been studied by Chatelin and Ho. The reader is referred to 
(Chatelin and Ho, 1990) for details of constructing these polynomials. 

The reasoning behind this type of algorithm is that that if V\ is a linear combination of 
precisely k eigenvectors of A then Arnoldi factorization terminates in k steps (i.e., /* = 0). 
The columns of V* will form an orthonormal basis for the invariant subspace spanned by 
those eigenvectors, and the Ritz values cr{Hk) will be the corresponding eigenvalues of A. 
The update of the starting vector v\ is designed to enhance the components of this vector 
in the directions of the wanted eigenvectors and damp its components in the unwanted 
directions. This effect is achieved at Step (a5.4) since 

n n 

t ; i = £*,7; =► ${A)v x - £x j ^( A j ) 7 j . 
j=l j= 1 

If the same polynomial were applied each time, then after M iterations, the j-th original 
expansion coefficient would be essentially attenuated by a factor 

W(Ai )J ’ 

where the eigenvalues have been ordered according decreasing values |^(Aj))|. The eigen- 
values inside the region C u become less and less significant as the iteration proceeds. Hence, 
the wanted eigenvalues are approximated increasingly well as the iteration proceeds. 

Another restarting strategy proposed by Saad is to replace the starting vector with a 
linear combination of Ritz vectors corresponding to wanted Ritz values. If the eigenvalues 
and corresponding vectors are re-indexed so that the first k are wanted and ( Xj , 0j ) is the 
the Ritz pair approximating the eigenpair then 

v t £*j7j (4) 

j = 1 

is taken as the new r starting vector. Again, the motivation here is that the Arnoldi residual 
fk would vanish if these k Ritz vectors were actually eigenvectors of A and the Ritz vectors 
are the best available approximations to these eigenvectors. A heuristic choice for the 
coefficients 7 j has also been suggested by Saad (1980). It is to weight the j - th Ritz vector 
with the value of its Ritz estimate and then normalize so that the new starting vector 
has norm 1. This has the effect of favoring the Ritz vectors that have least converged. 
Additional aspects of explicit restarting are developed thoroughly in Chapter VII of (Saad, 
1992). In any case, this restarting mechanism is actually polynomial restarting in disguise. 
Since ij € IC m (A,v 1) implies xj = <f>j(A)v 1 for some polynomial 4>j the formula for v+ in 
(4) is of the form 

k 

V* «- <KA)v\ = £7,<M^i- ( 5 ) 

j=i 
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The technique just described is referred to as explicit (polynomial) restarting. When 
Chebyshev polynomials are used it is called an Arnoldi-Chebyshev method. The cost in 
terms of matrix-vector products w — Av is M * (m + deg(ip)) for M major iterations. The 
cost of the arithmetic in the Arnoldi factorization is M * (2 n * m 2 + 0(m 3 )) Flops (floating 
point operations). Tradeoffs must be made in terms of cost of the Arnoldi factorization vs. 
cost of the matrix-vector products Av and also in terms of storage ( nm + 0(m 2 )). 

5.2. Implicit restarting 

There is another approach to restarting that offers a more efficient and numerically 
stable formulation. This approach, called implicit restarting , is a technique for combining 
the implicitly shifted QR mechanism with a k - step Arnoldi or Lanczos factorization to 
obtain a truncated form of the implicitly shifted QR-iteration. The numerical difficulties 
and storage problems normally associated with Arnoldi and Lanczos processes are avoided. 
The algorithm is capable of computing a few ( k ) eigenvalues with user specified features such 
as largest real part or largest magnitude using 2 nk + 0(k 2 ) storage. No auxiliary storage 
is required. The computed Schur basis vectors for the desired fc-dimensional eigenspace are 
numerically orthogonal to working precision. This method is well suited to the development 
of mathematical software and this will be discussed in Section 7. 

Implicit restarting provides a means to extract interesting information from very large 
Krylov subspaces while avoiding the storage and numerical difficulties associated with the 
standard approach. It does this by continually compressing the interesting information into 
a fixed size A;-dimensional subspace. This is accomplished through the implicitly shifted QR 
mechanism. An Arnoldi factorization of length m — k + p 

AV m = V m H m + / m e£, (6) 

is compressed to a factorization of length k that retains the eigen-information of interest. 
This is accomplished using QR steps to apply p shifts implicitly. The first stage of this shift 
process results in 

AV+ = V+H+ + f m e T m Q , (7) 

where V+ = V m Q, H+ = Q T H m Q, and Q = Q\Qi • ■ -Q p , with Qj the orthogonal matrix 
associated with the shift pj. It may be shown that the first k - 1 entries of the vector e^Q 
are zero (i.e. e^Q = (ae^,q T ) ). Equating the first k columns on both sides yields an 
updated A;— step Arnoldi factorization 

AV+ = V+H+ + ^ e T k , (8) 

with an updated residual of the form f£ = Vj^ p ek+i$k + A+p* 7 - Using this as a starting 
point it is possible to apply p additional steps of the Arnoldi process to return to the original 
ra-step form. 

Each of these shift cycles results in the implicit application of a polynomial in A of 
degree p to the starting vector. 

p 

Vi il>(A)v i with = IK* - p 3 ). 

i 
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The roots of this polynomial are the shifts used in the QR process and these may be 
selected to filter unwanted information from the starting vector and hence from the Arnoldi 
factorization. Full details may be found in (Sorensen, 1992). The basic iteration is given 
here in Algorithm 6 and the diagrams in Figures 1-3 describe how this iteration proceeds 
schematically. In Algorithm 6 and in the discussion below, the notation denotes 

the leading n x k submatrix of M . 





Figure 1: Representation of \'k+ p Hk+p + fk+p f l+ p - Shaded regions denote 
nonzeros . 




Figure 2: 1 \+ P QQ T Hk+ P Q + fk+ptI+ P Q after p implicitly shifted QR steps. 
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Figure 11: Leading k columns 4- fkfl form a length k Arnoldi factor- 
ization after discarding the last p columns. 
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Observe that if m = n then / = 0 and this iteration is precisely the same as the Implicitly 
Shifted QR iteration. Even for m < n, the first k columns of V and the Hessenberg 
submatrix H( ; >t) are mathematically equivalent to the matrices that would appear in the 
full Implicitly Shifted QR iteration using the same shifts fij. In this sense, the Implicitly 
Restarted Arnoldi method may be viewed as a truncation of the Implicitly Shifted QR 
iteration. The fundamental difference is that the standard Implicitly Shifted QR iteration 
selects shifts to drive subdiagonal elements of H to zero from the bottom up while the 
shift selection in the Implicitly Restarted Arnoldi method is made to drive subdiagonal 
elements of H to zero from the top down. Important implementation details concerning the 
deflation (setting to zero) of subdiagonal elements of H and the purging of unwanted but 
converged Ritz values are beyond the scope of this discussion. However, these details are 
extremely important to the success of this iteration in difficult cases. Complete details of 
these numerical refinements may be found in (Lehoucq, 1995 and Lehoucq and Sorensen, 
1994). 

Algorithm 6: An Implicitly Restarted Arnoldi Method 

Input: (A V y H . /) with AV m = VmH m + /mem, 

(an m-Step Arnoldi Factorization); 

For £=1,2, 3, ... until convergence 

(a6.2) Compute cr(Hm) and select set of p shifts fi\ y p2>'<-Pp 
based upon cr(H m ) or perhaps other information; 

(a6.3) q T -el; 

(a6.4) For j = 1,2, 

Factor [Qj , Rj] = q r(H m — /t,/); 

Hm - QfHmQj ; V m - V m Qy 

9 - 9 * 0 ,; 

End.For 

(a6.5) fk Vk+lPk + Vk — Kn(j ; n,l:fc); Hk — :k), 

(a6.6) Beginning with the k-step Arnoldi factorization 

A Vk — \'kHk + fk£k 5 

apply p additional steps of the Arnoldi process 
to obtain a new m-step Arnoldi factorization 

A V m = V m i/ m /mCm • 

End.For 

The above iteration can be used to apply any known polynomial restart. If the roots 
of the polynomial are not known there is an alternative implementation that only requires 
one to compute q\ = ip(H)e i, where xp is the desired degree p polynomial. A sequence of 
Householder transformations may developed to form a unitary matrix Q such that Qe i = qi 
and H Q H HQ is upper Hessenberg. The details which follow standard developments for 

the Implicitly Shifted QR iteration will be omitted here. 
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A shift selection strategy that has proved successful in practice is called the “Exact Shift 
Strategy”. In this strategy, one computes a(H) and sorts this into two disjoint sets Q w and 
fi u . The k Ritz values in the set are regarded as approximations to the “wanted” 
eigenvalues of A, and the p Ritz values in the set Q u are taken as the shifts pj. An 
interesting consequence (in exact arithmetic) is that after Step (a6.4) above, the spectrum 
of H k in Step (a6.5) is cr(Hk) = Q w and the updated starting vector Vi is a particular linear 
combination of the k Ritz vectors associated with these Ritz values. In other words, the 
implicit restarting scheme with exact shifts provides a specific selection of the coefficients 7j 
in Eq. (4) and this implicit scheme costs p rather than the k + p matrix- vector products the 
explicit scheme would require. Thus the exact shift strategy can be viewed both as a means 
to damp unwanted components from the starting vector and also as directly forcing the 
starting vector to be a linear combination of wanted eigenvectors. The exact shift strategy 
has two additional interesting theoretical properties. 

Lemma 5 If H is unreduced and diagonalizable then: 

1. The polynomial <p in (5) satisfies <p( A) = -0(A)/c>(A), 

where rb is the exact shift polynomial and p is some 
polynomial of degree at most k — 1. 

2. The updated Krylov subspace generated by the new 
starting vector satisfies 

IC m (A, vf) = Span{xi,x 2 , ■ • ■ ,&k, Axj, A 2 £j , • • ■ , A p £j} 
for j = 1,2, •■■,k. 

The first property <t>{ A) = ^(A)p(A) indicates that the linear combination selected by the 
exact shift scheme is somehow minimal while the second property indicates that each of the 
subspaces K, p {A,x 3 ) C K m {A,v f ) so that each sequence of “wanted” Ritz vectors is rep- 
resented equally in the updated subspace. The first property was established in (Lehoucq, 
1995) along with an extensive analysis of the numerical properties of implicit restarting. The 
surprising second property was established in (Morgan, 1996), along with some compelling 
numerical results indicating superior performance of implicit over explicit restarting. 

6. The Generalized Eigenvalue Problem 

A typical source of large-scale eigenproblems is through a discrete form of a contin- 
uous problem. The resulting finite-dimensional problems become large due to accuracy 
requirements and spatial dimensionality. Typically this takes the form 


Cu = u\ in fl, (9) 

u satisfies B on dQ , 

where C is some linear differential operator. A number of techniques may be used to 
discretize C. The finite element method provides an elegant discretization. If >V is a space 
of functions in which the solution to (9) may be found and W„ C W is an n-dimensional 
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subspace with basis functions {4>j} then an approximate solution u n is expanded in the 
form 

n 

u n — 2 \ 0j£.r 

i=i 

A variational or a Galerkin principle is applied depending on whether or not £ is self-adjoint, 
leading to a weak form of (9 ) 

A(v, u) = A < v, u >, (10) 

where A(v, u) is a bilinear form. Substituting the expanded form of u = u n and requiring 
(10) to hold for each trial function v = fa gives a set of algebraic equations 


i=l 

where < -, • > is an inner product in W n 


— a < fa , 'y ' O] >, 

j = i 

This leads to the following systems of equations 


^ ’ A(<pj , <f>j )(,j — A 'y < <£> l , (fij > 

j=i j = i 


for 1 < i < n. We may rewrite (11) and obtain the matrix equation 


( 11 ) 


Ax - A Mx, 


where 

A ; ^ ~ A(<t>i,4>j), Mi t j = < <f>i,(f>j > , x t 

for 1 < t, j < n. Typically the basis functions are chosen so that few entries in a row of 
A or M are nonzero. In structures problems A is called the “stiffness” matrix and M is 
called the “mass" matrix. In chemistry and physics M is often referred to as the “overlap” 
matrix. A nice feature of this approach to discretization is that if the basis functions 4> 3 all 
individually satisfy B on d£l then the boundary conditions are naturally incorporated into 
the discrete problem. Moreover, in the self-adjoint case, the Rayleigh principle is preserved 
from the continuous to the discrete problem. In particular, since Ritz values are Rayleigh 
quotients, this assures the smallest Ritz value is greater than the smallest eigenvalue of the 
original problem. 

Thus, it is natural for large-scale eigenproblems to arise as generalized rather than 
standard problems. If £ is self-adjoint the discrete problems are symmetric or Hermitian 
and if not the matrix A is nonsymmetric but the matrix M is symmetric and at least 
positive semi- definite. There are a number of ways to convert the generalized problem to 
standard form. There is always motivation to preserve symmetry when it is present. 

If M is positive definite then there exists a factorization M = LL T , and the eigenvalues 
of A = L~ 1 AL~ t are the eigenvalues of (A,M), and the eigenvectors are obtained by 
solving L t x = x, where x is an eigenvector of A. This standard transformation is fine if one 
wants the eigenvalues of largest magnitude and it preserves symmetry if A is symmetric. 
However, when M is ill-conditioned this can be a dangerous transformation leading to 
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numerical difficulties. Since a matrix factorization will have to be done anyway, one may 
as well formulate a spectral transformation. 


6.1. Structure of the spectral transformation 

A convenient way to provide a spectral transformation is to note that 

Ax - XMx <=> (A — fxM)x = (A - fi)Mx 


Thus 

(A- fj,M)~ l Mx = x0 , where 6 - . 

A — j.1 

If A is symmetric then one can maintain symmetry in the Arnoldi/Lanczos process by 
taking the inner product to be 

< x, y >= x T My. 

It is easy to verify that the operator ( A - is symmetric with respect to this inner 

product if A is symmetric. In the Arnoldi/Lanczos process the matrix- vector product w «— 
Av is replaced by w {A — Mv and the step h — V T f is replaced by h <— V T (M f). 

If A is symmetric then the matrix H is symmetric and tridiagonal. Moreover, this process 
is well defined even when M is singular and this can have important consequences even if 
A is nonsymmetric. We shall refer to this process as the M-Arnoldi process. 

If M is singular then the operator 5 = (A - M has a nontrivial null space and 

the bilinear function < x,y >= x T My is a semi-inner product and ||x||m =< x,y > 1/2 is a 
semi-norm. Since (A — fiM) is assumed to be nonsingular, A f =Null(S ) =Null(M). Vectors 
in N are generalized eigenvectors corresponding to infinite eigenvalues. Typically, one is 
only interested in the finite eigenvalues of (A, M ) and these will correspond to the nonzero 
eigenvalues of 5. The invariant subspace corresponding to these nonzero eigenvalues is 
easily corrupted by components of vectors from Af during the Arnoldi process. However, 
using the M-Arnoldi process with some refinements can provide a solution. 

In order to better understand the situation, it is convenient to note that since M is 
positive semi-definite, there is an orthogonal matrix Q such that 


M = Q 


D 0 
0 0 



where D is a positive definite diagonal matrix of order n, say. Thus 

51 0 ' 

5 2 0 J ’ 

where 5i is a square matrix of order n and 52 is an m x n matrix with the original A, M 
being of order m + n. Observe now that a nonzero eigenvalue A of 5 satisfies Sx — x A , i.e. 


S\X i 


xjA 

S 2 x i 


x 2 A 


so that x 2 = x5 2 .Ti must hold. Note also that for any eigenvector x H = (x^,x^), the 
leading vector Xi must be an eigenvector of Si- Since 5 is block triangular, <r(5) = a(5i)U 


S s Q t SQ = 
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<7(O m ). Assuming S 2 has full rank, it follows that if S j has a zero eigenvalue then there is no 
corresponding eigenvector (since 5 * 2 X 1 = 0 would be. implied). Thus if zero is an eigenvalue 
of 5i with algebraic multiplicity m 0 then zero is an eigenvalue of S of algebraic multiplicity 
m + m 0 and with geometric multiplicity m. Of course, since, 5 is similar to S all of these 
statements hold for S as well. 


6.2. Eigenvector/ null-space purification 

With these observations in hand, it is possible to see the virtue of using M-Arnoldi on 
S. After A:-steps of M-Arnoldi, 

SV = VH + fej with V t MV = I k , V T M f = 0. 


Introducing the similarity transformation Q gives 

SV = VH + fej with V t Q t MQV = I k , V T Q T MQ f = 0, 

where V = Q T V and / = Q J f . Partitioning V T = {V^Vj) and f T — fj) consistent 
with the blocking of S gives 

SiVi = V\ H + he T k with V?D\\ = I k ,V?Dh = 0. 


Moreover, the side condition 52^1 = V^H + holds, so that in exact arithmetic a zero 
eigenvalue should not appear as a converged Ritz value of H . This argument shows that 
M-Arnoldi on 5 is at the same time doing D-Arnoldi on S \ while avoiding convergence to 
zero eigenvalues. 

Round-off error due to finite precision arithmetic will cloud the situation, as usual. It 
is clear that the goal is to prevent components in j\f from corrupting the vectors V . Thus, 
to begin, the starting vector v\ should be of the form v\ — Sv. If a final approximate 
eigenvector x has components in they may be purged by replacing x Sx and then 


normalizing. To see the effect of this, note that if x = Q 


x\ 

x 2 


then Sx — Q 


S 1X1 

S 2 x\ 


and all components in A f which are of the form Q 


0 

V 


will have been purged. This 


final application of S may be done implicitly in two ways. One is to note that if x — Vy 
with Hy = yd then Sx = VHy + fejy = xO + and this is the correction suggested 

by (Nour-Omid et ah, 1987). Another recent suggestion due to Meerbergen and Spence 
is to use implicit restarting with a zero shift (Meerbergen and Spence, 1995). Recall that 
implicit restarting with t zero shifts is equivalent to starting the M-Arnoldi process with a 
starting vector of S*v i and all the resulting Ritz vectors will be multiplied by S 1 as well. 
After applying the implicit shifts to H , the leading submatrix of order k — t will provide the 
updated Ritz values. No additional explicit matrix- vector products with S are required. 

The ability to apply l zero shifts (i.e., to multiply by S * implicitly) is very important 
when S\ has zero eigenvalues. If S\X\ — 0 then 


' Si 

0 ' 

*1 


0 

. 52 

0 

. x 2 . 


S 2 £i 


€ A/\ 
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Thus to completely eradicate components from Af one must multiply by S e , where £ is equal 
to the dimension of the largest Jordan block corresponding to a zero eigenvalue of Si. 

Spectral transformations were studied extensively by Ericsson and Ruhe (1980) and the 
first eigenvector purification strategy was developed in (Nour-Omid et al., 1987). Shift and 
invert techniques play an essential role in the block Lanczos code developed by Grimes, 
Lewis, and Simon. The many nuances of this technique in practical applications are dis- 
cussed thoroughly in (Grimes et aL, 1994). The development presented here and the eigen- 
vector purification through implicit restarting is due to Meerbergen and Spence (1995). 


6.3. An example 

This discussion is illustrated with the following example. 


.4 = 


K C 
C T 0 


and M = 


I 0 

0 0 ’ 


with A an order 325 matrix approximation to a convection-diffusion operator and C a 
structured random matrix. This example was chosen because it has the block structure of 
a typical steady-state Navier-Stokes linear stability analysis; see (Meerbergen and Spence, 
1995). The following MATLAB code was used to generate the example: 


rand( ’seed’ ,0) ; 
n = 225;m=100; 

K = lapc(n, 100) ; 

C = [rand(m,m) ; zeros (n-m,m)] ; 

M = [eye(n) zeros(n,m) ; zeros(m,n) zeros(m,m)]; 
A = [K C ; C’ zeros (m,m)] ; 
mu = 7.0; 

S = (A - mu*M)\M; 


The matrices K, C, M, A correspond to the matrices in the equations above. The function 
lapc computes a finite difference approximation to A u + pu T on a 15 x 15 regular grid in 
the unit square with p = 100. Any matrix pencil (A, M) with this block structure (assuming 
C has full rank and A - pM is nonsingular) will produce an 5 of the form 


5 = 


0 0 0 

0 5 22 o , 

S31 S32 0 


with the upper-left zero block of order m and with S 22 nonsingular and order n — m. From 
the above discussion one may conclude that 5 has an eigenvalue 0 with algebraic multiplicity 
2m and geometric multiplicity m. There are three important subspaces associated with 5. 
They are JV , Q and 72, and these spaces satisfy 


SA r = {0} , SG C AT , 572 c 72. 


All of C n may be represented as a direct sum of these three spaces. The (oblique) projectors 
associated with these spaces shall be denoted by P^r , Pg, and Pn respectively. Explicit 
formulas are: 
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\\PmV\\ 

3.70 


1.48(-11) 


\\PqV\\ \\M11 

1.32(-11) 2.85(-12) 


Table 1: Projection of V onto AT and Q 

3 \\Axj-Mxj\jW 1|( Ax, - Mxj\j)+\\ 

1 1.50(-03) 9.93(-06) 

2 l.llf-02) 6.77(-05) 

Table 2: Residuals before and after purging components from Af and Q 


P* 


Pg 


Pn 


0 0 0 

0 0 0 , 

0 —SziS 2 2 I 

1 0 0 ‘ 

0 0 0 , 

0 0 0 

0 0 0 ' 

0 522 

531 532 0 


Table 1 shows the norms of the projections of the basis vectors V onto the spaces Af and 
Q , where V was computed with 20 steps of M- Arnoldi starting with a vector Vi = Sv ( v a 
vector with all entries equal to 1). The norms of the projections are taken before and after 
purging by applying two zero shifts using implicit restarting. The “+” symbol denotes the 
updated basis after purging. 

Table 2 shows the residual norms for the two approximate eigenvalues that are closest 
to the shift // before and after purging. 

Clearly, there is considerable merit to doing this purging. This generalizes the purging 
proposed in (Nour-Omid et al., 1995) and seems to be quite promising. Further testing is 
needed but some form of this process is essential to the construction of numerical software 
to implement shift-invert strategies. 


7. Software, Performance, and Parallel Computation 

The Implicitly Restarted Arnoldi Method has been implemented and a package of For- 
tran 77 subroutines has been developed. This software, called ARPACK (Lehoucq et al., 
1994), provides several features which are not present in other codes based upon a single- 
vector Arnoldi process. One of the most important features from the software standpoint 
is the reverse communication interface. This feature provides a convenient way to interface 
with application codes without imposing a structure on the user’s matrix or the way a 
matrix-vector product is accomplished. In the parallel setting, this reverse communication 
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interface enables efficient memory and communication management for massively parallel 
MIMD and SIMD machines. The important features of ARPACK are: 

• A reverse communication interface. 

• Ability to return k eigenvalues that satisfy a user specified criterion, such as largest 
real part, largest absolute value, largest algebraic value (symmetric case), etc. 

• A fixed pre-determined storage requirement throughout the computation. Usually 
this is n * 0(2k) + 0(k 2 ), where k is the number of eigenvalues to be computed and 
n is the order of the matrix. No auxiliary storage or interaction with such devices is 
required during the course of the computation. 

• Eigenvectors computed on request. The Arnoldi basis of dimension k is always com- 
puted. The Arnoldi basis consists of vectors which are numerically orthogonal to 
working accuracy. Computed eigenvectors of symmetric matrices are also numerically 
orthogonal. 

• User-specified numerical accuracy of the computed eigenvalues and vectors. Residual 
tolerances may be set to the level of working precision. At working precision, the 
accuracy of the computed eigenvalues and vectors is consistent with the accuracy 
expected of a dense method such as the implicitly shifted QR iteration. 

• No theoretical or computational difficulty for multiple eigenvalues, other than addi- 
tional matrix- vector products required to expose the multiple instances. This is made 
possible through the implementation of deflation techniques similar to those employed 
to make the implicitly shifted QR-algorithm robust and practical. A block method is 
not required; hence, one does not need to “guess" the correct blocksize that would be 
needed to capture multiple eigenvalues. 


7.1. Reverse communication interface 

As mentioned above, the reverse communication interface is one of the most important 
aspects of the design of ARPACK. In the serial code, a typical usage of this interface is 
illustrated with the following example, where snaupd is an ARPACK module: 

10 continue 

call snaupd (ido, bmat, n, which,..., 

* V, .., work, info) 
if (ido . eq. newprod) then 

call matvec (’A*, n, workd(ipntr(l) ) , 

* workd(ipntr (2) ) ) 
else 

return 
endif 
go to 10 
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As usual, with reverse communication, control is returned to the calling program when 
interaction with the matrix A is required. The action requested of the calling program is 
to simply perform the action indicated by the reverse communication parameter ido (in 
this case, multiply the vector held in the array workd beginning at location ipntr(l) and 
put the result in the array workd beginning at location ipntr(2)). Note that call to the 
subroutine matvec in this code segment is simply meant to indicate that this matrix- vector 
operation is taking place. The user is free to use any available mechanism or subroutine to 
accomplish this task. In particular, no specific data structure is imposed and, indeed, no 
explicit representation of the matrix is even required. One only needs to supply the action 
of the matrix on the specified vector. 

There are several reasons for supplying this interface. It is more convenient to use with 
large application codes. The alternative is to put the user supplied matrix-vector product 
in a subroutine with a pre-specified calling sequence. This may be quite cumbersome and 
is especially so in those cases where the action of the matrix on a vector is known only 
through a lengthy computation that doesn’t involve the matrix A explicitly. Typically, if 
the matrix- vector product must be provided in the form of a subroutine with a fixed calling 
sequence, then named common or some other means must be used to pass data to the routine. 
This is incompatible with efficient memory management for massively parallel MIMD and 
SIMD machines. 

This has been implemented on a number of parallel machines including the CRAY-C90, 
Thinking Machines CM-200 and CM-5, Intel Delta, and CRAY T3D. Parallel performance 
on the C90 is obtained through the BLAS operations without any modification to the serial 
code. SIMD performance on the CM-200 is also relatively straightforward. All of the BLAS 
operations were expressed using Fortran90 array constructs and hence were automatically 
compiled for execution on the SIMD array instead of the front end. Operations on the 
projected matrix H were not encoded with these array constructs and hence were auto- 
matically scheduled for the front end. The only additional complication was to define the 
data layouts of the V array and the work arrays for efficient execution. In the distributed 
memory implementations, the reverse communication interface provided a natural way to 
parallelize the ARPACK codes internally without imposing a fixed parallel decomposition 
on the user supplied matrix- vector product. 

7.2. Data distribution and global operations 

The parallelization strategy for distributed memory machines consists of providing the 
user with a Single Program Multiple Data (SPMD) template. The array V is blocked 
and distributed across the processors. The projected matrix H is replicated. The SPMD 
program looks essentially like the serial code except that the local block Vloc is passed 
in place of V. The work space is partitioned consistently with the partition of V and 
each section of the work space is distributed to the node processors. Thus the SPMD 
parallel code looks very similar to that of the serial code. Assuming a parallel version of the 
subroutine matvec, an example of the application of the distributed interface is illustrated 
as the follows: 

10 continue 

call snaupd (ido, bmat, nloc, which, .... 
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* Vloc, . work, info) 
if (ido . eq. newprod) then 

call matvec (’A*, nloc, workd(ipntr(l) ) , 

* workd ( ipnt r ( 2 ) ) ) 
else 

return 
endif 
go to 10 

Where, nloc is the number of rows in the block Vloc of V that has been assigned to this 
node process. 

Typically, the blocking of V is commensurate with the parallel decomposition of the 
matrix A as well as with the configuration of the distributed memory and interconnection 
network. Logically, the V matrix be partitioned by blocks 

\ rT — K< 2 ) T , y{nproc) T j 

with one block per processor and with H replicated on each processor. 

The explicit steps of the process responsible for the j block are: 

1. 8 k = gnorm{f [ M) ); — fl^/0; 

2. V<i\ - (H, v M )<»: B„ ( fjr ) • 

3. z <- (Aloc)v k+ 1 ; 

4. h <- Vjfl z; h ~ gsum(h ( *>) f k+1 z - V k+1 h: 

5. H k+t ~-(H k ,h): 

The function gnorm at Step 1 is meant to represent the global reduction operation of 
computing the norm of the distributed vector from the norms of the local segments 
and the function gsum at Step 4 is meant to represent the global sum of the local 
vectors hS^ so that the quantity h = h ^ is available to each process on completion. 

These are the only two global communication points within this algorithm. The remainder 
is perfectly parallel. Additional communication will typically occur at Step 3. Here the 
operation (Aloc)v is meant to indicate that the user supplied matrix- vector product is able 
to compute the local segment of the matrix-vector product Av that is consistent with the 
partition of V\ Ideally, this would only involve nearest neighbor communication among the 
processes. 

Since H is replicated on each processor, the parallelization of the implicit restart mech- 
anism described by Algorithm 6 remains untouched. The only difference is that the local 
block takes the place of the full matrix V . All operations on the matrix H are repli- 
cated on each processor. Thus there is no communication overhead but there is a “serial 
bottleneck’' here due to the redundant work. If k is small relative to n, this bottleneck 
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is insignificant. However, it becomes a very important latency issue as k grows, and will 
prevent scalability if k grows with n as the problem size increases. 

The main benefit of this approach is that the changes to the serial version of ARPACK 
are very minimal. Since the change of dimension from matrix order n to its local distributed 
blocksize nloc is invoked through the calling sequence of the subroutine snaupd, there is 
no essential change to the code. Only six routines were effected, and these in a minimal 
way. These routines required either a change in norm calculation for distributed vectors 
(Step 1) or in the distributed dense matrix- vector product (Step 4). Since the vectors are 
distributed, norms had to be done via partial (scaled) dot products for the local vector 
segments and then a global sum operation was used to complete the sum of the squared 
norms of these segments on all processors. More specifically, the commands are changed 
from 

rnorro = sdot (n, resid, 1, workd, 1) 
rnorm = sqrt(abs(rnorm) ) 


to 


rnormO = sdot (n, resid, 1, workd, 1) 
call gs sum (rnormO , 1 .trap) 
rnormO = sqrt (abs (rnormO) ) 
rnorm = rnormO 

Similarly, the computation of the matrix-vector product operation h V T w requires a 
change from 

call sgemv (’T’, n, j, one, v, ldv, workd(ipj), 1, 

* zero, h(l , j) , 1) 


to 


call sgemv (’T’, n, j, one, v, ldv, workd(ipj), 1, 

* zero, h(l , j ) , 1) 

call gssum(h(l , j ) , j ,h(l , j+1) ) 

so the global sum operation gssum w r as sufficient to implement all of the global operations. 

7.3. Distributed memory parallel performance 

To get an idea of the potential performance of ARPACK on distributed memory ma- 
chines some examples have been run on the Intel Touchstone DELTA. The examples have 
been designed to test the performance of the software, the matrix structure, the Touchstone 
DELTA machine architecture, and the speedup behavior of the software on DELTA. 

The user’s implementation of the matrix-vector product w *- Av can have considerable 
effect upon the parallel performance. Moreover, there is a fundamental difficulty in testing 
how the performance scales as the problem size increases. The difficulty is that the prob- 
lem often becomes increasingly difficult to solve as the size increases due to clustering of 
eigenvalues. The tests reported here attempt to isolate and measure the performance of the 
parallelization of the ARPACK routines independently of the matrix-vector product. 
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Problem size Number of nodes Total Time (s) 


3000*1 

1 

22.96 

3000*2 

2 

23.22 

3000*4 

4 

23.98 

3000*8 

8 

24.08 

3000*16 

16 

24.39 

3000*32 

32 

24.95 

3000*64 

64 

25.50 

3000*128 

128 

27.13 

3000*256 

256 

28.65 


Table 3: Parallel ARPACK scaled speedup test on DELTA, matrix order 3,000 on each 
node 

In order to isolate the performance of the ARPACK routines from the performance of the 
user’s matrix- vector product and also to isolate effects of a changing problem characteristics 
as the size increases, a test was comprised of replicating the same matrix repeatedly to obtain 
a block diagonal matrix. Each diagonal block corresponds to a block of the partitioned and 
distributed matrix V . This is, of course, a completely contrived situation that allows the 
workload to increase linearly with the number of processors. Since the each diagonal block 
of the matrix is identical the algorithm should behave as if nproc identical problems are 
being solved simultaneously as long as the initial distributed segments of V\ are generated 
the same. Thus, the only things that could prevent ideal speedup are the communication 
involved in the global operations and the “serial bottleneck” associated with the replicated 
operations on the projected matrix H . If neither of these w'ere present then one would expect 
the execution time to remain constant as the problem size and the number of processors 
increase. 

In this first example, each diagonal block is of order 3,000, which is identical to the 
vector segment size on each node. The matrix-vector product operation «— (Aloc)v^ l 

is executed locally on each node processor upon the distributed vector segments , and 
there is no communication among processors involved in this operation. As described above, 
the problem size in increased linearly with the the number of processors by adjoining an 
additional identical diagonal block to the A matrix for each additional processor. The global 
sum operation gssum is essentially a ring algorithm and thus has a linear cost with respect 
to the number of nodes. Since the diagonal blocks are identical, the replicated operations 
on H should remain the same as the problem size increases and hence linear speedup is 
expected, i.e., as the problem size increases the execution time should remain constant. 
This ideal speedup is very nearly achieved, as reflected in Table 3. 

The second example is obtained from a similar numerical model of the eigenproblem of 
the Laplacian operator defined on the unit square with square with Dirichlet boundary con- 
ditions on three sides and a Neuman boundary condition on the fourth side. This leads to 
a mildly nonsymmetric matrix with the same 5-diagonal structure as the standard 2-D dis- 
crete Laplacian with a 5-point stencil. The unit square {(x, t/)|0 < x, y < 1} was discretized 
with x-direction mesh size l/(n + l) and y-direction mesh size l/(m+ 1), respectively. Thus 
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Problem size Number of nodes Total Time (s) 


2500*1 

1 

19.63 

2500*2 

2 

20.71 

2500*4 

4 

21.97 

2500*8 

8 

22.47 

2500*16 

16 

22.50 

2500*32 

32 

23.13 

2500*64 

64 

23.68 

2500*128 

128 

24.78 

2500*256 

256 

28.16 


Table 4: Parallel ARPACK scaled speedup test on DELTA, matrix order 2,500 on each 
node 

the matrix A is block tridiagonal and of order N = nm . The order of each diagonal block 
is rc, and the number of diagonal blocks is m . 

A natural way to carry out the matrix-vector product operation w Av is described 
as follows. A standard domain decomposition partitioning of the unit square into sub- 
rectangles leads to a parallel matrix-vector product that exchanges data only across the 
boundaries of the subdomains and hence needs only nearest neighbor connections. The 
subdomains are naturally chosen so that the blocking of the matrix is commensurate with 
the blocking and distribution of the V array. The reverse communication interface allows 
the user supplied matrix- vector product to take advantage of the matrix structure. Simple 
send and receive operations using the native Intel isend and irecv are used to carry out the 
nearest neighbor communication operation. 

The results of these tests are given in Table 4 and demonstrate nearly the same speedup 
as in Table 3. The relatively minor communication to receive boundary data from nearest 
neighbors slightly degraded the speedup. 

The final example shows how dramatically an inefficient matrix- vector product operation 
w — Av and also how problem size can effect performance. A naive way to perform the 
matrix- vector product would be to collect the segments of the vector v from all nodes before 
the operation, and then distribute the segments of the result vector w to each node after the 
operation. The performance of this scheme is shown in Table 5. No advantage of the matrix 
structure was taken in computing the matrix- vector product. The matrix size was fixed at 
n = 3,200. The parallel ARPACK software was then used to compute the eigenvalues and 
eigenvectors. A residual tolerance of (10“ 8 ) was imposed. 

Table 5 shows the total time and the number of iterations required to solve this fixed 
problem with a different number of processors. The number of iterations varied with dif- 
ferent processor configurations and this is attributed to different initial random vectors 
being generated as the number of processors changed. However, the corresponding result 
eigenvalues and eigenvectors are identical for all of the runs. 

The speedup caused by increasing the number of processors can be observed by checking 
the average run time per iterate for each individual test. The fourth column in Table 5, 
demonstrates deteriorated speedup after the number of processors exceeds 32. Column five 
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Nodes 

Time (s) 

Iters. 

Ave- T jZ e 

OP*x Time 
Total Time 

i 

1809.07 

173 

10.46 

0.84 % 

2 

1073.36 

189 

5.679 

1.48 % 

4 

732.72 

213 

3.440 

2.65 % 

8 

449.95 

225 

2.000 

5.24 % 

16 

201.27 

192 

1.048 

8.90 % 

32 

114.98 

154 

0.747 

13.3 % 

64 

161.24 

260 

0.620 

18.0 % 

128 

128.28 

210 

0.611 

25.9 % 


Table 5: Parallel ARPACK fixed-size speeedup test, matrix order 3,200 
shows that the reason for this deterioration lies with the inefficient matrix- vector product. 

7.4. General applications of ARPACK 

ARPACK has been used in a variety of challenging applications, and has proven to 
be useful both in symmetric and nonsymmetric problems. It is of particular interest when 
there is no opportunity to factor the matrix and employ a “shift and invert” form of spectral 
transformation, 

A<-{A-oI)~ l . (12) 

Existing codes often rely upon this transformation to enhance convergence. Extreme eigen- 
values {n} of the matrix A are found very rapidly with the Arnoldi/Lanczos process and 
the corresponding eigenvalues {A} of the original matrix A are recovered from the relation 
A = l/fi + cr. Implementation of this transformation generally requires a matrix factoriza- 
tion. In many important applications this is not possible due to storage requirements and 
computational costs. The implicit restarting technique used in ARPACK is often successful 
without this spectral transformation. 

One of the most important classes of application arise in computational fluid dynamics. 
Here the matrices are obtained through discretization of the Navier-Stokes equations. A typ- 
ical application involves linear stability analysis of steady state solutions. Here one linearizes 
the nonlinear equation about a steady state and studies the stability of this state through 
the examination of the spectrum. Usually this amounts to determining if the eigenvalues of 
the discrete operator lie in the left halfplane. Typically these are parametrically dependent 
problems; the analysis consists of determining phenomena such as simple bifurcation, Hopf 
bifurcation (an imaginary complex pair of eigenvalues cross the imaginary axis), turbulence, 
or vortex shedding as a parameter is varied. ARPACK is well suited to this setting as it is 
able to track a specified set of eigenvalues while they vary as functions of the parameter. 
Our software has been used to find the leading eigenvalues in a Couette-Taylor wavy vortex 
instability problem involving matrices of order 4,000. One interesting facet of this applica- 
tion is that the matrices are not available explicitly and are logically dense. The particular 
discretization provides efficient matrix-vector products through Fourier transform. Details 
may be found in (Edwards et ah. 1994). 
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Very large symmetric generalized eigenproblems arise in structural analysis. One ex- 
ample that we have worked with at Cray Research through the courtesy of Ford Motor 
Company involves an automobile engine model constructed from 3D solid elements. Here 
the interest is in a set of modes to allow solution of a forced frequency response problem 
( K — XM)x = f(t ), where f(t) is a cyclic forcing function which is used to simulate ex- 
panding gas loads in the engine cylinder as well as bearing loads from the piston connecting 
rods. This model has over 250,000 degrees of freedom. The smallest eigenvalues are of 
interest and the ARPACK code appears to be very competitive with the best commercially 
available codes on problems of this size. For details, see (Sorensen et al., 1993). 

The Singular Value Decomposition (SVD) may also be computed using ARPACK and 
possesses many large-scale applications. Two SVD applications occur in computational 
biology. The first of these is the 3-D image reconstruction of biological macromolecules 
from 2-D projections obtained through electron micrographs. The second is an application 
to molecular dynamical simulation of the motions of proteins. The SVD may be used to 
compress the data required to represent the simulation and more importantly to provide 
an analytical tool to help in understanding the function of the protein. See (Romo et al., 
1994) for further details of the molecular dynamics application. The underlying algorithm 
for reconstructing 3-D image reconstruction of biological macromolecules from 2-D pro- 
jections ( Van Heel and Frank, 1981) is based upon the statistical technique of principal 
component analysis (Van Ruffle and Vandewalle, 1991). In this algorithm, a singular value 
decomposition (SVD) of the data set is performed to extract the largest singular vectors, 
which are then used in a classification procedure. Our initial effort has been to replace the 
existing algorithm for computing the SVD with ARPACK which has increased the speed of 
the analysis by a factor of 7 on an Iris workstation. The accuracy of the results were also 
increased dramatically. Details are reported in (Feinswog et al., in preparation). 

Computational chemistry provides a rich source of problems. 

ARPACK is being used in two applications currently and holds pro- 
mise for a variety of challenging problems in this area. We are collaborating with researchers 
at Ohio State on large-scale three-dimensional reactive scattering problems. The governing 
equation is the Schroedinger equation and the computational technique for studying the 
physical phenomena relies upon repeated eigenanalysis of a Hamiltonian operator consisting 
of a Laplacian operator discretized in spherical co-ordinates plus a surface potential. The 
discrete operator has a tensor product structure from the discrete Laplacian plus a diagonal 
matrix from the potential. The resulting matrix has a block structure consisting of m x 
777 blocks of order n . The diagonal blocks are dense and the off diagonal blocks are 
scalar multiples of the order n identity matrix. It is virtually impossible to factor this 
matrix directly because the factors are dense in any ordering. We are using a distributed 
memory parallel version of ARPACK together with some preconditioning ideas to solve 
these problems on distributed memory machines. Encouraging computational results have 
been obtained on Cray Y-MP machines and also on the Intel Delta and the CM-5. The 
code has recently been ported to the CRAY T3D with very promising results. On a matrix 
of order 12,800, computing the smallest eight eigenvalues using a Chebyshev polynomial 
preconditioner of degree eight, the CRAY YMP executed at a rate of 290.66 Mflop/s while 
the T3D using the distributed-shared memory model executed at a peak rate of 1412 Mflop/s 
(See Table 6). For details about the method and experimental results, see (Pendergast et 
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Nprocs Mflop/s 


2 172.50 

4 322.03 

8 586.29 

16 1006.60 

32 1412.73 

Table 6: Parallel ARPACK fixed-size computation rate test on T3D Shared Memory, matrix 
order 12,800 


aL, 1994) and (Sorensen et al.. 1993). 

Nonsymmetric problems also arise in quantum chemistry. Researchers at University of 
Washington have used the code to investigate the effects of the electric field on InAs/GaSb 
and GaAs/Al r Gai_ x quantum wells. ARPACK was used to find highly accurate solutions 
to these nonsymmetric problems which defied solution by other means. See (Li and Kuhn, 
1993) for details. Researchers at University of Massachusetts have used ARPACK to solve 
eigenvalue problems arising in their FEM quantum well lip model for strained layer super- 
lattices (Baliga et al., 1994). 

A final example of nonsymmetric eigenproblems to be discussed here arises in magneto- 
hydrodynamics (MHD) involving the study of the interaction of a plasma and a magnetic 
field. The MHD equations describe the macroscopic behavior of the plasma in the magnetic 
field. These equations form a system of coupled nonlinear PDEs. Linear stability analysis 
of the linearized MHD equations leads to a complex eigenvalue problem. Researchers at 
the Institute for Plasma Physics and Utrecht University in the Netherlands have modified 
the codes in ARPACK to work in complex arithmetic and are using the resulting code to 
obtain very accurate approximations to the eigenvalues lying on the Alfven curve. The code 
is not only computes extremely accurate solutions, it does so very efficiently in comparison 
to other methods that have been tried. See (Kooper et al M 1993) for details. 

There are many other applications. It is hoped that the examples briefly mentioned here 
indicate the versatility of the ARPACK software as well as the wide variety of eigenvalue 
problems that arise. 

8. Conclusions 

This paper has attempted to give an overview of the numerical solution of large-scale 
eigenvalue problems. Basic theory and algorithms were introduced to motivate Krylov 
subspace projection methods. The focus has been on a particular variant, the Implicitly 
Restarted Arnoldi Method, which has been developed into a substantial software package, 
ARPACK. 

There are a number of competing methods that have not been discussed here in any 
detail. Two notable methods that have not been discussed are methods based on the non- 
symmetric two-sided Lanczos process and methods based upon subspace iteration. At this 
point, no single method appears to be viable for all problems. Certainly in the nonsym- 
metric case there is no “black box” technique and it is questionable that there is one in the 
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symmetric case either. A block method called ABLE based upon two-sided nonsymmetric 
Lanczos is being developed by Bai, Day and Ye (1995). Software based upon subspace iter- 
ation with Chbevchev acceleration has been developed by Duff and Scott (1993). Jennifer 
Scott has also developed software based upon an explicitly restarted Chebyshev-Arnoldi 
method (Scott. 1993). Finally, the Rational Krylov method being developed by Ruhe (1984 
and 1994) is very promising for the nonsymmetric problem when a factorization of the 
matrix is possible. 

The computational results presented in Section 7 are due to Zdenko Tomasic and Dan 
Hu. I would like to thank Rich Lehoucq for producing Figures 1-3 and for constructive 
comments and discussions about this work. 
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(administered by the U.S. Army Research Office), and by the National Science Foundation 
project ASC-9408795. 
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